We address the problem of detecting race conditions in programs that use semaphores for synchronization. Netzer and Miller showed that it is NP-complete to detect race conditions in programs that use many semaphores. We show in this paper that it remains NP-complete even if only two semaphores are used in the parallel programs. For the tractable case, i.e., using only one semaphore, we give two algorithms for detecting race conditions from the trace of executing a parallel program on p processors, where n semaphore operations are executed. The first algorithm determines in O(n) time whether a race condition exists between any two given operations. The second algorithm runs in O(np log n) time and outputs a compact representation from which one can determine in O(1) time whether a race condition exists between any two given operations. The second algorithm is near-optimal in that its running time is only O(log n) times the time required simply to write down the output.



1 Introduction

Race detection is crucial in developing and debugging shared-memory parallel programs [5, ?, 11, 18]. Explicit synchronization is usually added to such programs to coordinate access to shared data. For example, when using a semaphore, a V-operation increments the semaphore, and a P-operation waits until the semaphore is greater than zero and then decrements it. P-operations are typically used to wait (synchronize) until some condition is true (such as a shared buffer becoming non-empty), and V-operations typically signal that some condition is now true. Race conditions result when this synchronization does not force concurrent processes to access data in the expected order. One way to detect races dynamically is to trace the program's execution and analyze the traces afterward. A central part of dynamic race detection is to compute from the trace the order in which shared-memory accesses were guaranteed by the execution's synchronization to have executed. Accesses to the same location that are not guaranteed to execute in some particular order are considered a race. When programs use semaphore operations for synchronization, some operations (belonging to different processes) could have executed in an order different from the traced one.
In this paper, we address the tractability of detecting race conditions from the traces of parallel programs that use semaphores. Let p be the number of processors used to execute the parallel program, and let n be the total number of semaphore operations performed in the execution. The trace can then be represented by a directed n-node graph G consisting of p disjoint chains, each representing the sequence of semaphore operations executed by one processor. A schedule of G is a linear ordering of all nodes in G consistent with the precedence constraints imposed by the arcs of G. A prefix of a schedule of G is a subschedule of G. A subschedule of G is valid if, at each point in the subschedule and for each semaphore, the number of P-operations never exceeds the number of V-operations (i.e., all semaphores are always nonnegative). Then, if the trace indicates that v preceded w in the actual execution, but a valid subschedule¹ exists in which w precedes v, then v and w could have executed in either order, i.e., there is a race condition between v and w. Miller and Netzer showed that detecting race conditions in parallel programs that use multiple semaphores is NP-complete [15]. Researchers have developed exact algorithms for cases where the problem is efficiently solvable (programs that use types of synchronization weaker than semaphores, such as post/wait/clear) and heuristics for the multiple-semaphore case [?, ?]. The complexity for the case of a constant number of semaphores was unknown. In the present paper, we show that the problem remains NP-complete even if only two semaphores are used in the parallel program.
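The definition of a race can be checked mechanically on very small traces. The Python sketch below is only a brute-force reference, not one of the algorithms developed in this paper: it explores every reachable cut, so it is exponential in general. It assumes a single semaphore with initial value 0, a trace given as one list of "P"/"V" strings per processor, and operations named by (processor, index) pairs; the function name `can_precede` is hypothetical.

```python
def can_precede(chains, a, b, initial=0):
    """Brute-force test: is there a valid subschedule in which a runs before b?

    chains  -- one list of 'P'/'V' operations per processor (a single semaphore)
    a, b    -- operations named as (processor, index) pairs
    initial -- initial semaphore value (0 matches the validity definition)
    Explores every reachable cut, so it is only practical for tiny traces.
    """
    start = tuple(0 for _ in chains)
    seen = {start}
    stack = [(start, initial)]  # (next index on each chain, semaphore value)
    while stack:
        pos, sem = stack.pop()
        for i, chain in enumerate(chains):
            k = pos[i]
            if k == len(chain):
                continue  # processor i has finished
            op = chain[k]
            if op == "P" and sem == 0:
                continue  # this P-operation would block: not a valid step
            if (i, k) == b:
                if pos[a[0]] > a[1]:
                    return True  # a has already run, so b can follow it
                continue  # b is not allowed to run before a
            nxt = pos[:i] + (k + 1,) + pos[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, sem + (1 if op == "V" else -1)))
    return False


# Processor 0 signals, processor 1 waits: the P cannot overtake the V.
print(can_precede([["V"], ["P"]], (1, 0), (0, 0)))  # False
```

If the trace ran v before w, a race between v and w exists exactly when `can_precede(chains, w, v)` returns True.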

For the case of using only one semaphore in parallel programs, we give two algorithms. The first algorithm detects in O(n) time whether a race condition exists between any two operations. The second algorithm computes in O(np log n) time a compact representation, from which one can determine whether a race condition exists between any two operations in O(1) time. Our results are based on reducing the problem of determining whether a valid subschedule exists in which w precedes v to the problem of Sequencing to Minimize Maximum Cumulative Cost (SMMCC). Given an acyclic directed graph G with costs on the nodes, the cumulative cost of the first i nodes in a schedule of G is the sum of the costs of these nodes. Thus, minimizing the maximum cumulative cost is an attempt to ensure that the cumulative cost stays low throughout the schedule. The SMMCC problem is NP-complete in general, even if the node costs are restricted to ±1. Abdel-Wahab and Kameda [?] presented an O(n²)-time algorithm for the special case in which G is a series-parallel graph. (The time bound was later improved to O(n log n) by the same authors [?].) As part of this solution, they gave an O(n log p)-time algorithm applicable when G consists of p disjoint chains. The existence of a valid schedule in which v precedes w can be reduced to the SMMCC problem in a chain graph augmented with one inter-chain edge. We add an edge from v to w, assign costs to the nodes (+1 if the node is a P-operation, −1 if it is a V-operation), and compute the minimum maximum cumulative cost. Clearly, this cost is non-positive if and only if there is a valid schedule. The augmented chain graph is not series-parallel, so the algorithms of Abdel-Wahab and Kameda [2, ?] are not applicable. We show that the SMMCC problem can nevertheless be solved in polynomial time. In fact, for the special case of interest, in which the costs are ±1, we give a linear-time algorithm.
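The reduction to SMMCC can likewise be checked on small inputs. The following Python sketch assumes a single semaphore initialized to 0; `COST`, `height`, and `min_height_with_arc` are hypothetical names. It assigns +1 to P-operations and −1 to V-operations, enforces the added arc by refusing to run w until v has run, and computes the minimum, over all schedules, of the maximum cumulative cost with an exhaustive dynamic program over cuts. It is exponential in p and serves only as a reference for tiny cases, not as the linear-time method of this paper. A sequence of operations is valid exactly when its height is 0, and a valid schedule with v before w exists exactly when the returned value is 0.

```python
from functools import lru_cache

COST = {"P": +1, "V": -1}  # +1 for a P-operation, -1 for a V-operation


def height(costs):
    """Maximum cumulative cost of a sequence, never less than 0."""
    best = run = 0
    for c in costs:
        run += c
        best = max(best, run)
    return best


def min_height_with_arc(chains, v, w):
    """Minimum height over all schedules of the chains plus the arc v -> w.

    chains -- one list of 'P'/'V' operations per processor
    v, w   -- (processor, index) pairs; w may not run until v has run
    """
    costs = [[COST[op] for op in chain] for chain in chains]
    end = tuple(len(chain) for chain in costs)

    @lru_cache(maxsize=None)
    def best(pos, cum):
        # smallest achievable maximum cumulative cost from this cut onward
        if pos == end:
            return cum
        options = []
        for i, chain in enumerate(costs):
            k = pos[i]
            if k == len(chain):
                continue
            if (i, k) == w and pos[v[0]] <= v[1]:
                continue  # the added arc forces v before w
            nxt = pos[:i] + (k + 1,) + pos[i + 1:]
            options.append(best(nxt, cum + chain[k]))
        # no move left before the end means the arc closed a cycle
        return max(cum, min(options)) if options else float("inf")

    return max(0, best(tuple(0 for _ in costs), 0))
```

For example, with one V on processor 0 and one P on processor 1, forcing the V first gives height 0 (valid), while forcing the P first gives height 1 (no valid schedule).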

The rest of the paper is organized as follows. Section 2 gives the preliminaries. Section 3 gives the algorithm for a single pair of nodes. Section 4 gives the algorithm for all pairs of nodes. Section 5 sketches the proof that race-condition detection is NP-complete if two semaphores are used in the parallel program.



¹We consider subschedules rather than schedules because deadlocks might happen during the execution of parallel programs.
Figure 1: A hump H of 12 nodes: v_1, v_2, ..., v_12. The cost of each node is in the circle. By definition, c(H) = −2, h(H) = 2, and h̄(H) = 4. Both v_2 and v_9 are peaks of H, but only v_2 is useful.

2 Preliminaries 

Suppose G is an acyclic graph with node costs. We introduce some terminology having to do with schedules, mostly adapted from [?]. A segment of a schedule is a consecutive subsequence. Let H = v_1 v_2 ⋯ v_m be a sequence of nodes. The cost of H, denoted c(H), is the sum of the costs of its nodes. The height of a node v_i in H is defined to be the sum of the costs of the nodes v_1 through v_i. The height of H, denoted h(H), is the maximum of 0 and the maximum height of the nodes in H. A node of maximum height in H is called a peak. A node of minimum height in H is called a valley. The reverse height of H, denoted h̄(H), is the height of H minus the cost of H. Note that height and reverse height are nonnegative. A schedule of G is optimal if its height is minimum over all schedules of G. We use h(G) to denote the height of an optimal schedule of G.
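The quantities just defined can be computed in one pass. The Python sketch below (the name `profile` and the 0-based node positions are assumptions of the sketch) returns the cost, height, reverse height, peaks, and valleys of a sequence given by its node costs.

```python
def profile(costs):
    """Cost, height, reverse height, peaks and valleys of a node sequence."""
    heights, run = [], 0
    for c in costs:
        run += c
        heights.append(run)  # height of a node: sum of costs up to and including it
    cost = run
    h = max([0] + heights)  # the height of a sequence is never negative
    return {
        "cost": cost,
        "height": h,
        "reverse_height": h - cost,
        "peaks": [i for i, x in enumerate(heights) if x == max(heights)],
        "valleys": [i for i, x in enumerate(heights) if x == min(heights)],
    }
```

For the costs [+1, +1, −1, −1, −1, +1], the node heights are 1, 2, 1, 0, −1, 0, so the cost is 0, the height and reverse height are both 2, the only peak is at position 1, and the only valley is at position 4.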

A sequence C = v_1 v_2 ⋯ v_m of nodes of G is called a chain of G if the only edges in G incident on these nodes are v_0v_1, v_1v_2, ..., v_{m−1}v_m, v_mv_{m+1}, where v_0 and v_{m+1} are other nodes, denoted pred(C) and succ(C), respectively. We use start(C) to denote v_1 and end(C) to denote v_m. Note that C could be a single node.

We use [v, w]_G to denote the chain of G starting from v and ending at w. Let [v, −]_G denote the longest chain of G starting from v, and [−, v]_G the longest chain of G ending at v. If it is clear from the context which graph is intended, then we may omit the subscript G. Note that the above notation might not be well-defined for an arbitrary acyclic graph G, but it is when G is composed of disjoint chains, which is the case of interest in this paper.

Suppose H is a chain of G containing a peak v_i such that (1) every node of H preceding v_i has nonnegative height in H, and (2) every node of H following v_i has height in H at least the cost of H. In this case, we call H a hump, and we say v_i is a useful peak of H. This definition is illustrated in Figure 1. We say a hump is an N-hump if its cost is negative, and a P-hump if its cost is nonnegative.
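The hump conditions can be tested directly. In the following Python sketch (hypothetical name `useful_peak`, 0-based positions), each peak is checked against conditions (1) and (2); the function returns the position of the first useful peak, or None when the sequence is not a hump.

```python
def useful_peak(costs):
    """Return the position of a useful peak if the sequence is a hump, else None."""
    heights, run = [], 0
    for c in costs:
        run += c
        heights.append(run)
    if not heights:
        return None
    top, total = max(heights), heights[-1]
    for i, x in enumerate(heights):
        if x != top:
            continue  # only peaks (nodes of maximum height) can be useful
        before_ok = all(y >= 0 for y in heights[:i])         # condition (1)
        after_ok = all(y >= total for y in heights[i + 1:])  # condition (2)
        if before_ok and after_ok:
            return i
    return None
```

For the costs [2, −3, 1, 2, −4], positions 0 and 3 are both peaks, but only position 0 is useful, because the node at position 1 has negative height; this mirrors the situation in Figure 1, where only one of two peaks is useful.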

We are concerned primarily with graphs G consisting of disjoint chains C_1, C_2, ..., C_p. For convenience, we assume that G contains an initial pseudonode (⊥), preceding all nodes, and a terminal pseudonode (⊤), following all nodes, each of cost zero. Thus, pred(v) could be ⊥ and succ(v) could be ⊤.

For the rest of the section we describe the properties of humps in schedules, mostly adapted from [?].

2.1 Hump Decomposition 

As part of their scheduling algorithm for series-parallel graphs, Abdel-Wahab and Kameda [?] show that a sequence of nodes can be decomposed in linear time into a set of humps by an algorithm Decomp(). It takes a chain as input and outputs a set of disjoint subchains such that every subchain is a hump. The output of Decomp(C) is unique, although it is not necessarily the only hump decomposition of C. An example is shown in Figure 2: the chain is decomposed by Decomp() into two N-humps and three P-humps. For a chain C, we say H is a hump of C if H ∈ Decomp(C).

Figure 2: A chain decomposed into two N-humps and three P-humps.

Figure 3: The second sequence of nodes is obtained from the first one by clustering the nodes 1–5 to node 3.

It can be proved that Decomp() has the following properties.

Hump-decomposition properties:

1. Suppose H_1, H_2 ∈ Decomp(C) and H_1 precedes H_2 in C. If c(H_1) ≥ 0, then c(H_2) ≥ 0 and h̄(H_1) > h̄(H_2). If c(H_2) < 0, then c(H_1) < 0 and h(H_1) < h(H_2).

2. If v is the first valley of [u, w], then Decomp([u, v]) (respectively, Decomp([succ(v), w])) consists of N-humps (respectively, P-humps) only.

3. Let C and C' be two disjoint chains, whose humps are respectively H_1, H_2, ..., H_k and H_{k+1}, H_{k+2}, ..., H_ℓ in order. Then, for some 1 ≤ i ≤ k and k ≤ j ≤ ℓ, the humps of CC' are

$$H_1, H_2, \ldots, H_i, (H_{i+1} \cdots H_j), H_{j+1}, \ldots, H_\ell$$

in order.

The third property implies that

$$\{\mathit{end}(H) : H \in \mathrm{Decomp}(CC')\} \subseteq \{\mathit{end}(H) : H \in \mathrm{Decomp}(C)\} \cup \{\mathit{end}(H) : H \in \mathrm{Decomp}(C')\}.$$

It will turn out that once we decompose a chain into humps, we need not be concerned with the internal structure of these humps. For each hump H we need only store c(H) and h(H). Thus, a chain consisting of ℓ humps can be represented by a length-ℓ sequence of pairs (c(H), h(H)). We call this sequence the hump representation of the chain. Using the third hump-decomposition property, one can straightforwardly derive the hump representation of C_1C_2 from the hump representations of C_1 and C_2. In particular, if we are given Decomp(C) and Decomp(C'), then computing Decomp(CC') takes O(|Decomp(C)| + |Decomp(C')|) time.
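Because any two consecutive segments A and B satisfy h(AB) = max{h(A), c(A) + h(B)} and c(AB) = c(A) + c(B), the cost and height of a chain can be recovered from its hump representation alone. A minimal Python sketch, with the hypothetical name `concat`:

```python
def concat(pairs):
    """Cost and height of consecutive segments, each given as (cost, height)."""
    cost = height = 0
    for c, h in pairs:
        height = max(height, cost + h)  # h(AB) = max{h(A), c(A) + h(B)}
        cost += c                       # c(AB) = c(A) + c(B)
    return cost, height


# [+2, -3] is the pair (-1, 2) and [+1] is (1, 1); together they give (0, 2),
# the same as computing [+2, -3, +1] node by node (heights 2, -1, 0).
print(concat([(-1, 2), (1, 1)]))  # (0, 2)
```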

2.2 Hump Clustering 

The following lemma concerns an operation on a schedule called clustering the nodes of a hump. Suppose H is a hump of G, and let v be a useful peak of H. Let S be a schedule of G. If all the nodes of H are consecutive in S, then we say H is clustered in S. If every hump of G is clustered in S, then we say the schedule S is clustered. If a hump is not clustered in a schedule, then we can modify the schedule to make it so. To cluster the nodes of H to v is to change the positions of the nodes of H other than v so that all the nodes of H are consecutive and the order among the nodes of H is unchanged. An example is shown in Figure 3.

Figure 4: (a) A graph G consisting of two chains. The first chain contains an N-hump followed by a P-hump. The second chain contains two P-humps. (b) A schedule for G of height four. (c) The schedule obtained from the previous one by clustering the N-hump to its useful peak. (d) A clustered schedule of G of height two, obtained from the previous schedule by clustering every hump. (e) A clustered schedule of G with minimum height.
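Clustering is a simple list operation. The Python sketch below (the name `cluster` and the representation of nodes as strings are assumptions) keeps every node outside H in its relative order and places the nodes of H consecutively, in chain order, so that the same non-hump nodes precede H as preceded the useful peak v.

```python
def cluster(schedule, hump, peak):
    """Cluster the nodes of `hump` to `peak` in `schedule`.

    schedule -- list of node names
    hump     -- the hump's nodes in chain order (must include `peak`)
    Returns a new list in which the hump's nodes are consecutive, in their
    original relative order, and every other node keeps its relative order.
    """
    members = set(hump)
    others = [x for x in schedule if x not in members]
    # non-hump nodes that were scheduled before the peak stay before the hump
    before = sum(1 for x in schedule[: schedule.index(peak)] if x not in members)
    return others[:before] + list(hump) + others[before:]
```

For instance, with hypothetical costs a1 = +2, b1 = +1, a2 = −3, where a1 a2 is a hump with useful peak a1, the schedule a1 b1 a2 has height 3, and `cluster(["a1", "b1", "a2"], ["a1", "a2"], "a1")` yields a1 a2 b1, of height 2.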

Lemma 2.1 (See [?]) Let G be an acyclic graph with node costs and H be a hump of G. Suppose S is a schedule of G. If T is obtained from S by clustering all nodes in H to a useful peak of H, then T is a schedule of G and h(T) ≤ h(S).

An example is shown in Figure 4. The height of the schedule in Figure 4(c) is smaller than that of the schedule in Figure 4(b). Two clustered schedules of the graph in Figure 4(a) are shown in Figures 4(d) and 4(e). It follows from Lemma 2.1 that there is always an optimal schedule of G which is clustered.



2.3 Standard Order 

A series S_1 ⋯ S_m of subsequences of nodes is in standard order if it satisfies the following properties.

Standard order properties.

• The series consists of S_i's with negative costs, followed by S_i's with nonnegative costs;

• The S_i's with negative costs are in nondecreasing order of height, and the S_i's with nonnegative costs are in nonincreasing order of reverse height.

If the humps of a chain are H_1, H_2, ..., H_ℓ in order, then the series H_1H_2 ⋯ H_ℓ is in standard order by the first hump-decomposition property.

Lemma 2.2 (See [?]) Let A, B, S_1, and S_2 be subsequences of nodes. Suppose S = S_1ABS_2 and T = S_1BAS_2. If the series BA is in standard order, then h(S) ≥ h(T).
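The standard-order conditions and the effect of Lemma 2.2 can be illustrated on segments summarized as (cost, height) pairs. In this Python sketch, `in_standard_order` and `concat_height` are hypothetical helpers, and the pairs A and B are made-up values: A has reverse height 0 and B has reverse height 1, so AB is not in standard order while BA is, and exchanging them lowers the height from 2 to 1.

```python
def in_standard_order(pairs):
    """Check the standard-order properties for segments given as (cost, height)."""
    k = sum(1 for c, _ in pairs if c < 0)
    if any(c >= 0 for c, _ in pairs[:k]):
        return False  # a nonnegative-cost segment precedes a negative-cost one
    heights = [h for _, h in pairs[:k]]      # negative costs: nondecreasing height
    reverse = [h - c for c, h in pairs[k:]]  # nonnegative costs: nonincreasing reverse height
    return (all(x <= y for x, y in zip(heights, heights[1:]))
            and all(x >= y for x, y in zip(reverse, reverse[1:])))


def concat_height(pairs):
    """Height of the concatenation of segments given as (cost, height)."""
    cost = height = 0
    for c, h in pairs:
        height = max(height, cost + h)
        cost += c
    return height


A, B = (1, 1), (0, 1)  # hypothetical segments with reverse heights 0 and 1
print(in_standard_order([A, B]), in_standard_order([B, A]))  # False True
print(concat_height([A, B]), concat_height([B, A]))          # 2 1
```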
For example, the sequence in Figure 4(d) is a clustered schedule of the graph in Figure 4(a). Note that the series of the last two humps in the schedule is not in standard order: the reverse height of the first hump (zero) is less than that of the second hump (one). The schedule in Figure 4(e), obtained by exchanging those two clustered humps, has height one less than that of the schedule in Figure 4(d).



2.4 Hump Merging 

A schedule of G is in standard form if it is clustered and its series of humps of G is in standard order. Let T be any schedule of G in standard form. Recall that by Lemma 2.1 there is always an optimal schedule S of G which is clustered. The humps of G, while clustered in both T and S, may not be in the same order. However, any two humps of the same chain of G must be in the same order in T and in S, else either T or S is not a schedule. Take two consecutive humps in S that are from different chains and that are not in the same order as in T, and exchange their positions. By Lemma 2.2, the resulting ordering has height no more than that of S. By a series of such exchanges, we eventually obtain T from S. It follows that the height of T is no more than that of S, and hence that T is optimal. This argument shows that every schedule in standard form is an optimal schedule of G.

Let I = {H_1, H_2, ..., H_m}, where the series H_1H_2 ⋯ H_m is in standard order. Suppose Merge(I) returns the sequence of nodes obtained by concatenating all humps in I in standard order, namely, Merge(I) = H_1H_2 ⋯ H_m. Assume for uniqueness that Merge() breaks ties in some arbitrary but fixed way. By the above argument we have the following lemma.

Lemma 2.3 (See [?]) The output of

$$\mathrm{Merge}\Bigl(\bigcup_{1 \le i \le p} \mathrm{Decomp}(C_i)\Bigr)$$

is an optimal schedule of G.



An example is shown in Figure 4. Since the schedule in Figure 4(e) is clustered and its series of humps is in standard order, it is an optimal schedule of the graph in Figure 4(a). Abdel-Wahab and Kameda showed that Merge(∪_{1≤i≤p} Decomp(C_i)) can be obtained in O(n log p) time. Note that the output of function Merge() may not be unique. Without loss of generality, however, we may define Merge() more restrictively as follows to make its output unique for the same G. Suppose G is composed of disjoint chains C_1, C_2, ..., C_p and I = ∪_{1≤i≤p} Decomp(C_i). Define Merge(I) = H_1H_2 ⋯ H_m, where {H_1, H_2, ..., H_m} = I and the series H_1H_2 ⋯ H_m is in standard order. Furthermore, if H_iH_j and H_jH_i are both in standard order, where C_{i'} contains H_i, C_{j'} contains H_j, and i' < j', then H_i precedes H_j in Merge(I).
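Given the hump representations of the p chains, each already in standard order by the first hump-decomposition property, the restricted Merge() can be sketched as a p-way merge. The Python sketch below uses the standard library's `heapq.merge` with a `key` argument, which takes O(log p) time per hump; we do not claim this is how Abdel-Wahab and Kameda implement Merge(). The names `standard_key` and `merge_humps` are hypothetical, chains are indexed from 0, each hump is a (cost, height) pair, and ties are broken by chain index and then by position within the chain, following the rule above.

```python
import heapq


def standard_key(chain, index, cost, height):
    """Sort key for standard order; ties go to the lower chain index."""
    if cost < 0:
        return (0, height, chain, index)        # N-humps first, by height
    return (1, -(height - cost), chain, index)  # then P-humps, by reverse height


def merge_humps(hump_reps):
    """hump_reps[i] is the hump representation [(cost, height), ...] of chain i,
    itself in standard order. Returns the humps as (chain, index, cost, height)
    in standard order, together with the height of the merged schedule."""
    streams = [[(i, k, c, h) for k, (c, h) in enumerate(rep)]
               for i, rep in enumerate(hump_reps)]
    # each stream is already sorted by the key, so a p-way merge suffices
    order = list(heapq.merge(*streams, key=lambda t: standard_key(*t)))
    cost = height = 0
    for _, _, c, h in order:
        height = max(height, cost + h)
        cost += c
    return order, height
```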



3 Algorithm for Single Pair 

A vector Γ = (x_1, x_2, ..., x_p) of p nodes is called a cut of G if each x_i is either ⊥ or a node in C_i. We call x_i the i-th cutpoint of Γ, also written Γ(i). The prefix subgraph G[Γ] of G is the subgraph ∪_{1≤i≤p} [−, x_i]. Therefore, the problem we address can be reduced to finding a cut such that an optimal schedule of the prefix subgraph determined by the cut has the minimum maximum cumulative cost. Let h be the maximum cumulative cost of an optimal subschedule that contains v and w, with v preceding w. If h is zero, then a valid subschedule exists (namely, the optimal one). If h is positive, then there is no valid subschedule, because the maximum cumulative cost of any such subschedule is at least h and is thus positive, too. The rest of the section shows that a best cut can be found in linear time.

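Since we will frequently encounter two cuts that differ at only one cutpoint, let NewCut(Γ, i, u) denote the cut Γ' with

$$\Gamma'(k) = \begin{cases} \Gamma(k) & \text{if } k \ne i,\\ u & \text{if } k = i. \end{cases}$$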
A j-schedule of G[Γ] is a schedule of G[Γ] whose last node is Γ(j). We use h_j(G[Γ]) to denote the height of an optimal j-schedule of G[Γ]. Suppose Γ(j) ≠ ⊥. One can compute h_j(G[Γ]) for a given Γ as follows. Let Γ' = NewCut(Γ, j, pred(Γ(j))). Clearly, if S is an optimal schedule of G[Γ'], then SΓ(j) is an optimal j-schedule of G[Γ]. It follows that

$$h_j(G[\Gamma]) = \max\{h(G[\Gamma']),\; c(G[\Gamma']) + h(\Gamma(j))\}.$$

Note that h(G[Γ]) and h_j(G[Γ]) are both nonnegative. We use v → w to signify that there is a valid subschedule of G in which v precedes w, and v ↛ w to signify that v → w does not hold. Note that neither → nor ↛ is a partial order.
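For concreteness, the following Python sketch represents a cut as a tuple of positions (0 for ⊥, k for the k-th node of the chain), which is an assumption of the sketch. It implements NewCut(Γ, i, u), the cut that agrees with Γ except that its i-th cutpoint is u, and the formula h_j(G[Γ]) = max{h(G[Γ']), c(G[Γ']) + h(Γ(j))} with Γ' = NewCut(Γ, j, pred(Γ(j))). The names `new_cut` and `h_j` are hypothetical, and the height of the single node Γ(j) is max{0, c(Γ(j))}.

```python
def new_cut(cut, i, u):
    """NewCut(cut, i, u): the cut that agrees with `cut` except at cutpoint i."""
    out = list(cut)
    out[i] = u
    return tuple(out)


def h_j(h_prev, c_prev, cost_last):
    """Height of an optimal j-schedule of G[cut].

    h_prev, c_prev -- height and cost of an optimal schedule S of
                      G[NewCut(cut, j, pred(cut[j]))]
    cost_last      -- cost of the node cut[j], which is appended to S
    """
    h_last = max(0, cost_last)  # height of the one-node sequence cut[j]
    return max(h_prev, c_prev + h_last)
```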

3.1 Basic Idea 

Every valid subschedule of G is a valid schedule of a prefix subgraph G[Γ] for some cut Γ of G. Therefore, v → w if and only if there is a cut Γ of G such that G[Γ] has a valid schedule in which v precedes w. Let h* be the minimum of h(G[Γ] ∪ {vw}) over all G[Γ]'s that contain v and w. It follows that v → w if and only if h* = 0. Hence, the problem of determining whether v → w reduces to computing the minimum height over a set of chain graphs, each augmented with an interchain arc. Two questions immediately arise: (1) How do we compute the height of G[Γ] ∪ {vw}, which is not even series-parallel? (2) How do we cope with the fact that there could be an exponential number of prefix subgraphs that contain v and w?

Let v and w be contained in two disjoint chains C_i and C_j, respectively. The following observation eases the situation. Suppose S is a subschedule of G containing w. Let S' be the subschedule of G obtained from S by discarding all nodes succeeding w in S. Clearly, h(S') ≤ h(S). Therefore, without loss of generality, the minimum of h(G[Γ] ∪ {vw}) can be computed over only the cuts Γ with Γ(j) = w. Moreover, we can let w always be the last node of a subschedule by considering only the minimum-height j-schedule of each G[Γ] that contains v. The first question above is then no longer an issue.

It turns out that the second question is not an issue, either. We will show that, in order to obtain the minimum height of all those j-schedules, it suffices to consider only O(√n) cuts. In particular, each of those O(√n) cuts is uniquely determined by its i-th cutpoint.

3.2 The Algorithm 

The algorithm takes v and w as inputs. Let C_i contain v and C_j contain w. The algorithm iterates over different cutpoints Γ(i) such that Γ(i) does not precede v. In each iteration, the algorithm calls the function Best() to obtain a minimum-height j-schedule of G[Γ] over all cuts Γ with the designated cutpoints in C_i and C_j. By comparing the heights of these j-schedules for different Γ(i)'s, the algorithm outputs the minimum height of j-schedules of G[Γ] over all Γ such that Γ(j) = w and Γ(i) does not precede v. In Figure 5 we give the algorithm to compute h(G[Γ*] ∪ {vw}), where Γ* is a best cut of G corresponding to vw.




Function MinHeight(v, w)
1  C_i := the chain containing v;
2  C_j := the chain containing w;
3  Γ(j) := w;
4  h* := ∞;
5  I_0 := {v} ∪ Decomp([succ(v), −]);
6  For every Γ(i) ∈ {end(H) : H ∈ I_0} do
7      S* := Best(j, {i, j}, Γ);
8      h* := min{h*, h(S*)};
9  Return h*;

Function Best(j, F, Γ)
1  I := ∪_{k∈F, k≠j} Decomp([−, Γ(k)]);
2  J := Decomp([−, pred(Γ(j))]);
3  K := ∪_{k∉F} Decomp(C_k);
4  s_1 := max{h(H) : H ∈ I ∪ J, c(H) < 0};
5  S^+ := Merge({H ∈ I ∪ J : c(H) ≥ 0});
6  s_2 := h(S^+Γ(j));
7  s := max{s_1, s_2};
8  K_s := {H ∈ K : h(H) < s, c(H) < 0};
9  S_s := Merge(I ∪ J ∪ K_s);
10 Return S_sΓ(j);

Figure 5: The algorithm for computing h(G[Γ*] ∪ {vw}) for a best cut Γ* of G corresponding to vw.
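To make the data flow of Best() concrete, the following Python sketch evaluates the height of the j-schedule it returns, working only on hump representations. It assumes the hump sets I, J, and K of Steps 1–3 are already available as lists of (cost, height) pairs, takes the maximum over an empty set in Step 4 to be 0, and sorts instead of using the tree structures of Section 3.4; `best_height`, `merged`, and `standard_key` are hypothetical names.

```python
def standard_key(hump):
    """Standard order: N-humps by nondecreasing height, then P-humps by
    nonincreasing reverse height."""
    c, h = hump
    return (0, h) if c < 0 else (1, -(h - c))


def merged(humps):
    """(cost, height) of MERGE(humps): the humps concatenated in standard order."""
    cost = height = 0
    for c, h in sorted(humps, key=standard_key):
        height = max(height, cost + h)
        cost += c
    return cost, height


def best_height(I, J, K, last_cost):
    """Height of the j-schedule returned by BEST, on hump representations.

    I -- humps (cost, height) of [-, cut(k)] for k in F, k != j  (Step 1)
    J -- humps of [-, pred(cut(j))]                             (Step 2)
    K -- humps of every chain C_k with k not in F               (Step 3)
    last_cost -- cost of the node cut(j), appended at the end
    """
    h_last = max(0, last_cost)                                # h of the node cut(j)
    s1 = max((h for c, h in I + J if c < 0), default=0)       # Step 4 (0 if none)
    c_plus, h_plus = merged([x for x in I + J if x[0] >= 0])  # Step 5: S+
    s2 = max(h_plus, c_plus + h_last)                         # Step 6: h(S+ cut(j))
    s = max(s1, s2)                                           # Step 7
    K_s = [x for x in K if x[0] < 0 and x[1] < s]             # Step 8
    c_s, h_s = merged(I + J + K_s)                            # Step 9: S_s
    return max(h_s, c_s + h_last)                             # Step 10: h(S_s cut(j))
```

With J = [(−1, 2)], K = [(−2, 1), (−1, 3)], an empty I, and a final node of cost +1, we get s = 2, only the hump (−2, 1) enters K_s, and the returned height is 1, whereas ignoring K would give 2.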

Function Best() is the essential part of the algorithm. Based on the given subset F of {1, 2, ..., p} and the given cut Γ, it looks for a best cut Γ* corresponding to vw such that Γ*(k) = Γ(k) for every k ∈ F. (In the case we are interested in, F = {i, j}.) An optimal j-schedule of G[Γ*] is then returned. Note that for every k ∉ F, Γ*(k) depends on a value s, which is the maximum of s_1 and s_2. Each of s_1 and s_2 is determined solely by the chains with indices in F and their designated cutpoints. Hence, the choices of Γ*(k) for different k ∉ F are mutually independent. This is the key to our efficient algorithm.

In Best(), we do not explicitly specify the cutpoints of Γ*. Instead, we work on hump representations of subchains, and every cutpoint is implicitly specified by end(H) for some hump H. Specifically, Step 1 ensures Γ*(k) = Γ(k) for every k ∈ F with k ≠ j. Steps 3 and 8 ensure that, for every k ∉ F, Γ*(k) = end(H), where H is the highest N-hump of C_k with h(H) < s. Since we are considering j-schedules, Γ*(j) is specified slightly differently. Although in Step 2 the subchain of C_j only goes up to pred(Γ(j)), Γ*(j) is still Γ(j), since the j-schedule S_sΓ(j) is returned in Step 10.

3.3 Correctness 

We answer the following two questions in this subsection:

1. Why is it sufficient to try for Γ(i) only those nodes in {end(H) : H ∈ I_0}?

2. Why does Best(j, F, Γ) return an optimal j-schedule of G[Γ*] with Γ*(k) = Γ(k) for every k ∈ F?

Lemma 3.1 Let Γ be a cut of G. Suppose [x, z] is a subchain of G containing Γ(i). Let H be the hump of [x, z] containing Γ(i). Let y be the first valley of [pred(H), Γ(i)]. If

$$\Gamma_1(k) = \begin{cases} \Gamma(k) & \text{if } k \ne i,\\ \mathit{pred}(H) & \text{if } k = i \text{ and } y = \mathit{pred}(H),\\ \mathit{end}(H) & \text{if } k = i \text{ and } y \ne \mathit{pred}(H), \end{cases}$$

then h_j(G[Γ_1]) ≤ h_j(G[Γ]).

Proof. Straightforward. □

Note that the pred(H) in the above lemma is always end(H') for some hump H' in I_0, which is defined in Step 5 of MinHeight(). Therefore, Lemma 3.1 answers the first question.

By the definitions of I, J, and K_s, it is not difficult to see that the sequence returned by Best(j, F, Γ) is an optimal j-schedule of G[Γ*] for some cut Γ* such that Γ*(k) = Γ(k) for every k ∈ F. The correctness of MinHeight() thus relies on the following lemma, which answers the second question.

Lemma 3.2 Let Γ be a cut. Let F be a subset of {1, 2, ..., p} containing j. If S* = Best(j, F, Γ), then h(S*) ≤ h_j(G[Γ]).



The rest of the subsection proves Lemma 3.2. Let F_ℓ = {1, ..., ℓ − 1, ℓ + 1, ..., p}. The following lemma is a special case of Lemma 3.2, in which F is composed of p − 1 numbers.

Lemma 3.3 Let Γ be a cut. If S* = Best(j, F_ℓ, Γ) for some ℓ ≠ j, then h(S*) ≤ h_j(G[Γ]).

Proof. Define Γ_1 by

$$\Gamma_1(k) = \begin{cases} \Gamma(k) & \text{if } k \ne \ell,\\ \text{the first valley of } [-, \Gamma(\ell)] & \text{if } k = \ell. \end{cases}$$

Then it is not difficult to see that h_j(G[Γ_1]) ≤ h_j(G[Γ]). Let Γ' be the cut with h(S*) = h_j(G[Γ']), i.e., S* is a j-schedule of G[Γ']. By the definition of Best(), Γ' and Γ_1 could differ only at the ℓ-th position. Clearly, it suffices to show h_j(G[Γ']) ≤ h_j(G[Γ_1]).

Let w = Γ_1(j). Let L = Decomp([−, Γ_1(ℓ)]). Define

$$S = \mathrm{Merge}(I \cup J \cup L),$$

where I and J are defined in Steps 1 and 2 of Best(). Clearly, Sw is an optimal j-schedule of G[Γ_1]. Thus, h(Sw) = h_j(G[Γ_1]). By the choice of Γ_1(ℓ), L contains no P-hump. Hence, by the uniqueness assumption of Merge(), we can write Sw = S_1S^+w, where S^+ is defined in Step 5 of Best(). We prove h_j(G[Γ']) ≤ h_j(G[Γ_1]) by showing that h_j(G[Γ']) ≤ h(Sw) both when Γ'(ℓ) succeeds Γ_1(ℓ) and when it precedes Γ_1(ℓ), as follows.

Case 1: Γ'(ℓ) succeeds Γ_1(ℓ). Since L contains no P-hump, each hump of [−, Γ_1(ℓ)] appears in S_1. Therefore, S_1S'S^+w is a j-schedule of G[Γ'], where S' = [succ(Γ_1(ℓ)), Γ'(ℓ)]. We show h(S_1S'S^+w) ≤ h(S_1S^+w). Now h(S_1S'S^+w) = max{h(S_1), c(S_1) + h(S'), c(S_1S') + h(S^+w)}. Clearly,

$$h(S_1) \le h(S_1 S^+ w). \tag{1}$$

By the definition of F_ℓ, the K_s defined in Step 8 of Best() is composed of the N-humps of C_ℓ that have heights less than s. Therefore, by the choice of Γ'(ℓ), every hump of [−, Γ'(ℓ)] has height less than s. It follows from the standard order of the humps in S' that h(S') ≤ s. By Step 7 of Best(), s = max{s_1, s_2}. If s = s_2 = h(S^+w), as defined in Step 6 of Best(), then c(S_1) + h(S') ≤ c(S_1) + h(S^+w) ≤ h(S_1S^+w). If s = s_1 = h(H*), where H* is a highest N-hump in I ∪ J, then we can write S_1 = S_2H*S_3. It follows that

$$\begin{aligned} c(S_1) + h(S') &= c(S_2 H^* S_3) + h(S') \\ &\le c(S_2) + h(H^*) \\ &\le h(S_2 H^*) \\ &\le h(S_1). \end{aligned}$$

Therefore, in either case we have

$$c(S_1) + h(S') \le h(S_1 S^+ w). \tag{2}$$

By the choice of Γ'(ℓ), c(S') ≤ 0. Hence,

$$c(S_1 S') + h(S^+ w) \le c(S_1) + h(S^+ w) \le h(S_1 S^+ w). \tag{3}$$

Combining (1), (2), and (3), we obtain h(S_1S'S^+w) ≤ h(Sw).



Case 2: Γ'(ℓ) precedes Γ_1(ℓ). Let S' = [succ(Γ'(ℓ)), Γ_1(ℓ)]. By the choice of Γ'(ℓ), it is not difficult to see that

$$\mathrm{Decomp}([-, \Gamma_1(\ell)]) = \mathrm{Decomp}([-, \Gamma'(\ell)]) \cup \mathrm{Decomp}(S').$$

By the choices of Γ_1(ℓ) and Γ'(ℓ), Decomp(S') contains only N-humps of heights no less than s. Note that every N-hump in I ∪ J has height no more than s. By the standard form of S, we know that S' is a suffix of S_1. Therefore, we can write Sw = S_2S'S^+w. Removing S' from Sw, we obtain a j-schedule S_2S^+w of G[Γ']. We show h(S_2S^+w) ≤ h(Sw).

Now h(S_2S^+w) = max{h(S_2), c(S_2) + h(S^+w)}. Clearly,

$$h(S_2) \le h(S_2 S' S^+ w) = h(Sw). \tag{4}$$

Since each hump of S' has height no less than s, h(S') ≥ s. Hence, h(S'S^+w) ≥ h(S') ≥ s ≥ s_2 = h(S^+w). It follows that

$$c(S_2) + h(S^+ w) \le c(S_2) + h(S' S^+ w) \le h(Sw). \tag{5}$$

Combining (4) and (5), we obtain h(S_2S^+w) ≤ h(Sw). □



Now we are ready to prove Lemma 3.2.

Proof of Lemma 3.2. Recall that S* = Best(j, F, Γ). Let Γ* be the cut such that S* is a j-schedule of G[Γ*]. (S* is certainly an optimal j-schedule of G[Γ*].) We use the algorithm in Figure 6 to prove the lemma. Procedure CutTrans() proceeds in iterations, in which the value of ℓ cycles through {1, ..., p}. If ℓ ∉ F, then the value of Γ(ℓ) is updated. Since S is an optimal j-schedule of G[Γ'], it follows from Lemma 3.3 that h_j(G[Γ']) ≤ h_j(G[Γ]) holds at every iteration of the while-loop, so h_j(G[Γ]) never increases. If we can show that CutTrans() always terminates, then the lemma is proved.

Let s_1*, s_2*, and s* be the s_1, s_2, and s in the execution of Best(j, F, Γ). Let s_1, s_2, and s be those in the execution of Best(j, F_ℓ, Γ). The values of s_1, s_2, and s change as the while-loop of CutTrans() proceeds. We show that Γ eventually becomes Γ* by arguing that s eventually becomes s*.

Since F ⊆ F_ℓ, s_1 ≥ s_1* always holds. By the definition of Best(), whenever Step 7 of CutTrans() is finished, [−, Γ(ℓ)] contains only N-humps. Thus, after the first p iterations of the while-loop, [−, Γ(ℓ)] contains no P-hump for every ℓ ∉ F. Henceforth, s_2 = s_2*; therefore s = max{s_1, s_2} ≥ max{s_1*, s_2*} = s*. If s > s*, then s = s_1 > s*. Since s_1 > s* ≥ s_1*, there must be an N-hump H in ∪_{k∉F} Decomp([−, Γ(k)]) such that h(H) = s_1. Since s = s_1, in the next iteration in which C_ℓ contains H, Γ(ℓ) will be moved before H by the definition of Best(). It follows that the value of s is nonincreasing and eventually becomes s*. Once s = s*, in the following p iterations, Γ(k) will be moved to Γ*(k) for every k ∉ F. The algorithm then terminates. □
Procedure CutTrans(Γ, Γ*)
1  ℓ := 0;
2  While Γ* ≠ Γ do
3      ℓ := (ℓ mod p) + 1;
4      If ℓ ∉ F
5          S := Best(j, F_ℓ, Γ);
6          Γ' := the cut such that S is an optimal j-schedule of G[Γ'];
7          Γ := Γ';

Figure 6: The algorithm transforms Γ to Γ*. We prove Lemma 3.2 by showing that this algorithm always terminates.

3.4 Implementation 

Recall that Decomp(C) runs in time linear in |C|, the length of chain C. It follows that the time complexity of Steps 1–5 and Step 9 of MinHeight() is O(n). Suppose the nodes are assigned to Γ(i) in the for-loop in the same order as they appear in C_i. In this subsection we focus on implementing Best() such that the for-loop runs in time O(n).



Number of Iterations. The following lemma ensures that the size of I_0 is O(√|C_i|). It follows that the number of iterations is O(√n).

Lemma 3.4 Suppose C is a chain with node costs ±1. The number of humps in Decomp(C) is O(√|C|).

Proof. Since the costs of nodes are either +1 or −1, a hump of height ℓ contains at least ℓ nodes. For the same reason, a hump of reverse height ℓ contains at least ℓ nodes. By the first hump-decomposition property, the heights of the N-humps in Decomp(C) are distinct, and so are the reverse heights of the P-humps in Decomp(C). If there are n_1 N-humps and n_2 P-humps in Decomp(C), then |C| = Ω(n_1² + n_2²) = Ω((n_1 + n_2)²). This proves the lemma. □
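Spelled out, the counting in the proof gives an explicit bound. List the distinct heights of the n_1 N-humps in increasing order; the k-th is at least k − 1, so that hump has at least k − 1 nodes, and the same holds for the reverse heights of the n_2 P-humps. Since the humps are disjoint,

$$|C| \;\ge\; \sum_{k=1}^{n_1} (k-1) + \sum_{k=1}^{n_2} (k-1) \;=\; \frac{n_1(n_1-1)}{2} + \frac{n_2(n_2-1)}{2},$$

and solving n(n − 1)/2 ≤ |C| gives n ≤ 1 + √(2|C|) for n = n_1 and for n = n_2, hence n_1 + n_2 ≤ 2 + 2√(2|C|).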

Compact Representation of Humps. For the sake of efficiency, we do not deal with the internal structure of humps in Best(). It suffices to represent each hump H by a pair (c(H), h(H)) and work on the compact representation of humps. Therefore, each of the I, J, and K computed in the first three steps is a set of pairs. Clearly, each of these three steps takes O(n) time. However, the contents of J and K do not change in different iterations. Thus, Steps 2 and 3 need only be executed once.

Since F = {i, j}, we have I = Decomp([−, Γ(i)]). Suppose I_t and Γ_t are the I and Γ in the t-th iteration for some t ≥ 2. Because of the order in which nodes are assigned to Γ(i), we need not recompute Decomp([−, Γ_t(i)]) from scratch. In the t-th execution of Step 1, [−, Γ_t(i)] is obtained by appending a hump [succ(Γ_{t−1}(i)), Γ_t(i)] to [−, Γ_{t−1}(i)]. By the argument following the hump-decomposition properties in §2.1, the t-th execution of Step 1 takes O(|I_{t−1}|) time. By Lemma 3.4, the time complexity of all executions of Step 1 is O(n + √n × √n) = O(n).
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Priority Tree To compute si efficiently, we resort to a priority tree, a complete binary tree with 
n + 1 leaves.0 Each leaf keeps two values, count and maxheight. The cost of the {h + l)-st leaf is 
the number of A^-humps of height h m lU J. The maxheight of the + l)-st leaf is (respectively, 
h), if its count is zero (respectively, nonzero). The maxheight of an internal node is the maximum 
maxheight of its children. It follows that the maxheight of the root of a priority tree is the correct 
value of s. The priority tree can be built in time 0{n). Whenever a hump is added to or deleted 
from / U J, the priority tree can be updated in time O(logn). Since J is fixed, to compute si in 
t-th iteration for every t > 2, we add humps in It — It~i to / U J, remove humps in — It from 
/ U J, and update the priority tree. By the third hump decomposition property, we have 



$$\sum_{t \ge 2} \bigl( |I_t - I_{t-1}| + |I_{t-1} - I_t| \bigr) = O(q_i) = O(\sqrt{n}),$$

where q_i is the number of humps in C_i. Hence, the time complexity of all executions of Step 4 is O(n + √n × log n) = O(n).
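A minimal sketch of the priority tree described above, written as an array-based segment tree. It is an illustration under stated assumptions, not the authors' code: a hump is represented only by its height, heights are integers in 0..n (index h here plays the role of the (h + 1)-st leaf), and −1 serves as the maxheight of an empty leaf, since any value below every height works for the max. The class and method names are hypothetical.

```python
class PriorityTree:
    """Array-based complete binary tree over heights 0..n.

    Leaf index h (the text's (h + 1)-st leaf) stores how many N-humps of
    height h are present; every node stores maxheight, the largest height
    with a nonzero count in its subtree, or EMPTY if there is none.
    """

    EMPTY = -1  # assumed sentinel: smaller than every height

    def __init__(self, n, heights=()):
        self.size = 1
        while self.size < n + 1:
            self.size *= 2
        self.count = [0] * self.size
        for h in heights:
            self.count[h] += 1
        self.maxh = [self.EMPTY] * (2 * self.size)
        for h in range(self.size):
            if self.count[h] > 0:
                self.maxh[self.size + h] = h
        for i in range(self.size - 1, 0, -1):  # bottom-up build: O(n)
            self.maxh[i] = max(self.maxh[2 * i], self.maxh[2 * i + 1])

    def _update(self, h, delta):
        self.count[h] += delta
        i = self.size + h
        self.maxh[i] = h if self.count[h] > 0 else self.EMPTY
        i //= 2
        while i >= 1:  # refresh the O(log n) ancestors of the leaf
            self.maxh[i] = max(self.maxh[2 * i], self.maxh[2 * i + 1])
            i //= 2

    def add(self, h):
        """A hump of height h enters I ∪ J."""
        self._update(h, +1)

    def remove(self, h):
        """A hump of height h leaves I ∪ J."""
        self._update(h, -1)

    def max_height(self):
        """maxheight of the root (s_1 in the text)."""
        return self.maxh[1]
```

For example, `PriorityTree(8, [1, 3, 3]).max_height()` is 3, and it drops to 1 only after both height-3 humps have been removed.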

Hump Tree To obtain the value of s_2, it is not necessary to know the value of S^+; we need only obtain the height of S^+Γ(j). Similarly, the actual value of S_s is irrelevant: what we compare in Step 8 of MinHeight() is the height of S_sΓ(j). We need a data structure that computes these two heights efficiently.

Let L be a set of humps such that the height and the reverse height of every H ∈ L are at most n. A hump tree T for L is a binary tree composed of two complete binary subtrees, each with n + 1 leaves. Let T_N be the left subtree and T_P the right subtree. The (h + 1)-st leaf of T_N is associated with the set of N-humps of height h in L, and the (h + 1)-st leaf of T_P with the set of P-humps of reverse height n − h in L. Let T_x be the subtree of T rooted at x, and let L_x be the set of humps associated with the leaves of T_x. Define h(T_x) = h(Merge(L_x)) and c(T_x) = c(Merge(L_x)). Clearly, when L = I ∪ J, h(T_P) = h(S^+) and c(T_P) = c(S^+). When L = I ∪ J ∪ K_s, h(T) = h(S_s) and c(T) = c(S_s). The heights of S^+Γ(j) and S_sΓ(j) can then be computed by

$$h(S^{+}\Gamma(j)) = \max\{h(S^{+}),\ c(S^{+}) + h(\Gamma(j))\};$$
$$h(S_s\Gamma(j)) = \max\{h(S_s),\ c(S_s) + h(\Gamma(j))\}.$$
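Both formulas are instances of one composition rule for a series of node costs: the height of a concatenation AB is max{h(A), c(A) + h(B)}. The sketch below checks this rule by brute force on short ±1 sequences. It assumes, as in SMMCC, that the height of a sequence is its maximum cumulative cost, with the empty prefix counted as 0; the function names are illustrative only.

```python
import itertools


def cost(seq):
    """c(S): total cost of a sequence of node costs."""
    return sum(seq)


def height(seq):
    """h(S): maximum cumulative cost over all prefixes (empty prefix = 0)."""
    best = running = 0
    for x in seq:
        running += x
        best = max(best, running)
    return best


def height_of_concat(a, b):
    """h(AB) computed from the summaries of A and B alone."""
    return max(height(a), cost(a) + height(b))


# Exhaustively compare against direct evaluation on all short +-1 sequences.
for la in range(5):
    for lb in range(5):
        for a in itertools.product((1, -1), repeat=la):
            for b in itertools.product((1, -1), repeat=lb):
                assert height_of_concat(a, b) == height(a + b)
```

The point of the rule is that only the pair (h, c) of each part is needed, which is what lets the hump tree store two numbers per node.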

Let us keep h(T_x) and c(T_x) in x for every node x of T. Therefore, the hump tree T takes O(n) space. We show how to compute h(T_x) and c(T_x) for every node x, from the leaves to the root. When x is a leaf of T, the humps in L_x have the same height if x is in T_N, and the same reverse height if x is in T_P. It is not difficult to see that c(T_x) = ∑_{H ∈ L_x} c(H), and

$$h(T_x) = \begin{cases} 0 & \text{if } L_x = \emptyset;\\ h & \text{if } x \text{ is the } (h+1)\text{-st leaf of } T_N;\\ c(T_x) - h & \text{if } x \text{ is the } (n-h+1)\text{-st leaf of } T_P. \end{cases}$$

When x is an internal node of T, h(T_x) and c(T_x) can be computed from the information kept in the children of x. Suppose y and z are the left and right children of x, respectively. For any H in L_y and H' in L_z, by the way we associate humps with leaves, the series HH' is in standard order. Hence,

$$h(T_x) = \max\{h(T_y),\ c(T_y) + h(T_z)\};$$

$$c(T_x) = c(T_y) + c(T_z).$$
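The bottom-up rule can be sketched as a segment tree whose nodes store the pair (h(T_x), c(T_x)) and combine children with h = max{h(T_y), c(T_y) + h(T_z)} and c = c(T_y) + c(T_z). The sketch is simplified and hypothetical: it uses one array of slots standing for the leaves in left-to-right order, lets the caller supply each leaf's pair, and uses (0, 0) for an empty leaf, which is the identity for this combine on summaries of actual series. Inserting or deleting a hump changes only the pair of one leaf, after which `set_leaf` refreshes its O(log n) ancestors.

```python
def combine(left, right):
    """(h, c) of the series `left` followed by `right`."""
    h_left, c_left = left
    h_right, c_right = right
    return (max(h_left, c_left + h_right), c_left + c_right)


EMPTY = (0, 0)  # summary of an empty series


class HumpTree:
    """Segment tree whose nodes hold (h(T_x), c(T_x)); node[1] is the root."""

    def __init__(self, leaves):
        self.size = 1
        while self.size < len(leaves):
            self.size *= 2
        self.node = [EMPTY] * (2 * self.size)
        self.node[self.size:self.size + len(leaves)] = list(leaves)
        for i in range(self.size - 1, 0, -1):  # leaves to root, linear time
            self.node[i] = combine(self.node[2 * i], self.node[2 * i + 1])

    def set_leaf(self, k, summary):
        """Replace leaf k's (h, c) and refresh its O(log n) ancestors."""
        i = self.size + k
        self.node[i] = summary
        i //= 2
        while i >= 1:
            self.node[i] = combine(self.node[2 * i], self.node[2 * i + 1])
            i //= 2

    def root(self):
        return self.node[1]
```

For instance, leaves (2, 2), (0, −1), (0, 0), (1, −1) summarize the cost series [1, 1], [−1], [], [1, −1, −1], and the root is (2, 0), the height and cost of their concatenation.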

*Note that there are other ways to implement Step 4 to run in linear time. However, the need for the priority tree will become clear when we address the implementation of the all-pairs algorithm.
Procedure RemoveRange(r, s)

    y := the s-th leaf of T_N;
    While y is not the root of T_N do
        x := the parent of y;
        If y is the left child of x then
            (h(T_x), c(T_x)) := (h(T_y), c(T_y));
        else
            Recompute h(T_x) and c(T_x);
        y := x;
    Recompute h(T) and c(T);



Figure 7: Let T be the hump tree for I ∪ J ∪ K^−. This procedure acts as if the N-humps of heights no less than s were removed from the hump tree.

It follows that the hump tree T for L can be built in O(n + |L|) time.

Once T is built, inserting a hump into L can be done efficiently. Suppose we insert H into L. If H is an N-hump, let x be the (h(H) + 1)-st leaf of T_N; if L_x = ∅, set h(T_x) = h(H), and in either case add c(H) to c(T_x). If H is a P-hump, let x be the (n − r + 1)-st leaf of T_P, where r is the reverse height of H, and add c(H) to both c(T_x) and h(T_x). To update T, we simply update the internal nodes on the path from x to the root of T. Deleting a hump from L can be done similarly by replacing every addition with a subtraction. Clearly, both insertion and deletion take O(log n) time.

To compute the heights of S^+Γ(j) and S_sΓ(j), we need not maintain one hump tree for I ∪ J and another for I ∪ J ∪ K_s. Let K^− be the set of N-humps in K, i.e., K^− = {H ∈ K : c(H) < 0}. It suffices to maintain a hump tree T for I ∪ J ∪ K^−. Since there is no P-hump in K^−, it is still true that h(T_P) = h(S^+) and c(T_P) = c(S^+). Although the hump tree is not for I ∪ J ∪ K_s, the values of h(S_s) and c(S_s) can be obtained efficiently by the procedure in Figure 7. Procedure RemoveRange() acts as if the N-humps of heights no less than s were removed from the hump tree for I ∪ J ∪ K^−. Therefore, the resulting h(T) and c(T) are h(S_s) and c(S_s), respectively. Clearly, RemoveRange() takes O(log n) time. Since we maintain the hump tree for I ∪ J ∪ K^− in every iteration, we use O(log n) space to record the information of T that RemoveRange() modifies. After obtaining the information we need, we restore the hump tree for I ∪ J ∪ K^− in O(log n) time.
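Procedure RemoveRange() in Figure 7 can be sketched on an array layout like the one above. The sketch is hypothetical in its names and data layout: T_N and T_P are kept as two separate arrays (root at index 1), their roots are combined to obtain h(T) and c(T), and the leaf pairs in the example are arbitrary (h, c) values chosen only to exercise the code. Overwritten nodes of T_N are saved and restored, mirroring the O(log n) extra space mentioned in the text.

```python
def combine(left, right):
    """(h, c) of the series `left` followed by `right`."""
    h_left, c_left = left
    h_right, c_right = right
    return (max(h_left, c_left + h_right), c_left + c_right)


EMPTY = (0, 0)


def build(leaves):
    """Array segment tree; node[1] is the root, leaves start at index size."""
    size = 1
    while size < len(leaves):
        size *= 2
    node = [EMPTY] * (2 * size)
    node[size:size + len(leaves)] = list(leaves)
    for i in range(size - 1, 0, -1):
        node[i] = combine(node[2 * i], node[2 * i + 1])
    return node, size


def remove_range(node_n, size_n, node_p, s):
    """Act as if leaves s+1, s+2, ... of T_N were empty (s is 1-based).

    Returns the (h, c) pair of the whole tree T, i.e. T_N followed by T_P,
    and restores T_N before returning.
    """
    saved = []                 # (index, old value): O(log n) entries
    y = size_n + (s - 1)       # the s-th leaf of T_N
    while y > 1:               # while y is not the root of T_N
        x = y // 2
        saved.append((x, node_n[x]))
        if y == 2 * x:         # y is the left child: drop the right sibling
            node_n[x] = node_n[y]
        else:                  # y is the right child: recompute from children
            node_n[x] = combine(node_n[2 * x], node_n[2 * x + 1])
        y = x
    result = combine(node_n[1], node_p[1])   # recompute h(T) and c(T)
    for idx, old in reversed(saved):          # restore T_N
        node_n[idx] = old
    return result
```

Each call walks one leaf-to-root path of T_N, so it runs in O(log n) time and leaves the tree exactly as it found it.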

Let I_t be the I in the t-th iteration for any t ≥ 1. To obtain the hump tree for I_t ∪ J ∪ K^− from that for I_{t−1} ∪ J ∪ K^−, we need to insert the humps in I_t − I_{t−1} into T and remove the humps in I_{t−1} − I_t from T. Since each insertion and deletion takes O(log n) time, it follows from the bound on ∑_t (|I_t − I_{t−1}| + |I_{t−1} − I_t|) in the analysis of Step 4 that the overall time, summed over all iterations, for obtaining each hump tree from that of the previous iteration is O(√n × log n). Recall that building a hump tree for L takes O(n + |L|) time. Since there are n nodes in G, |I_1 ∪ J ∪ K^−| = O(n). It follows that building a hump tree for I_1 ∪ J ∪ K^− takes O(n) time.

By the above arguments, we can implement Best() such that the overall time complexity of the while-loop in MinHeight() is O(n). We therefore have the following theorem.

Theorem 3.5 Suppose G is a graph consisting of p disjoint chains comprising n nodes, where each node represents either a P-operation or a V-operation. For any two nodes v and w of G, one can determine in O(n) time whether there is a valid subschedule in which v precedes w.






Procedure ChainPair(i, j)

    (v, w) := (end(C_i), end(C_j));
    Repeat
        If w = ⊥ then h := 1;
        else h := MinHeight(v, w);
        If h > 0 then
            first_j(v) := succ_j(w);
            v := pred(v);
        else w := pred(w);
    Until v = ⊥;



Figure 8: The algorithm that computes first_j(v) for every v ∈ C_i.

4 Algorithm for All Pairs 

In this section we show how to determine the → relations for all pairs of nodes in G. The linear-time algorithm for a single pair of nodes, applied to all O(n²) pairs, takes O(n³) time. Fortunately, there is a compact representation of this information: it suffices to indicate, for each node v and for each chain C not containing v, the first node w in C such that v precedes w in some valid subschedule. This representation has size O(np), where n is the number of nodes and p is the number of chains. The representation can be used to determine in constant time whether there is a race between two given operations v and w, assuming that the input p chains are schedulable.† To determine whether v can precede w, we look up the first node in w's chain that could be preceded by v in some valid subschedule. If this first node is w itself or comes before w in that chain, then v can precede w; otherwise, v cannot precede w. We therefore consider the complexity of constructing such a representation. Clearly, it can be constructed by a sequence of calls to the algorithm of Theorem 3.5. We show how to do much better; in fact, the time required by our algorithm is only O(log n) times the time required simply to write down the output.

4.1 The Algorithm 

Let first_j(v) denote the first node in C_j that could be preceded by v in some valid subschedule of G. The output of the all-pairs algorithm is thus the value of first_j(v) for every node v and every 1 ≤ j ≤ p. Note that first_j(v) could be ⊤, which means that none of the nodes in C_j can be preceded by v in any valid subschedule of G.
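With the table of first_j(v) values in hand, each query is a single lookup and comparison. Because v can precede every node of C_j from first_j(v) onward, and none before it, v can precede w exactly when first_j(v) is not ⊤ and does not come after w. The encoding below (0-based positions within C_j, `None` for ⊤, nested dictionaries keyed by node and chain) is an illustrative assumption, not the paper's data layout.

```python
def can_precede(first, v, j, w_pos):
    """Constant-time query on the compact representation.

    first[v][j] is the position of first_j(v) in chain C_j, or None for TOP
    (v can precede no node of C_j).  w_pos is the position of w in C_j.
    """
    f = first[v][j]
    return f is not None and f <= w_pos
```

The table has one entry per (node, chain) pair, which is the O(np) size stated above.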

Let us first describe the procedure ChainPair(i, j), which computes first_j(v) for every v ∈ C_i. The all-pairs algorithm simply calls ChainPair(i, j) for every 1 ≤ i, j ≤ p. For convenience, let succ_j(w) = succ(w) for every w ∈ C_j and let succ_j(⊥) = start(C_j). Procedure ChainPair(i, j) is shown in Figure 8. The algorithm starts by letting v be end(C_i) and w be end(C_j). The repeat-loop proceeds by replacing w with pred(w). Once MinHeight(v, w) is not zero, the algorithm reports succ_j(w) as first_j(v). After replacing v with pred(v), the repeat-loop continues the same procedure to search for the new first_j(v).
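The loop of Figure 8 can be sketched as follows. MinHeight() is not reimplemented here; the sketch takes a caller-supplied stand-in `min_height(v, w)` (a hypothetical parameter) that must return 0 exactly when v can precede w. Nodes are 0-based positions, −1 plays the role of ⊥, and position len(C_j) plays the role of ⊤. Each pass of the loop moves v or w back by one position, so at most |C_i| + |C_j| passes are made.

```python
def chain_pair(len_i, len_j, min_height):
    """Sketch of ChainPair(i, j); returns first[v] for every v in C_i.

    Positions run 0..len-1, -1 stands for bottom, succ_j(w) is w + 1, and
    the value len_j stands for TOP.  min_height(v, w) is a stand-in for
    MinHeight(v, w): it must be 0 exactly when v can precede w.
    """
    first = [None] * len_i
    v, w = len_i - 1, len_j - 1          # (end(C_i), end(C_j))
    while v != -1:                        # Until v = bottom
        h = 1 if w == -1 else min_height(v, w)
        if h > 0:
            first[v] = w + 1              # first_j(v) := succ_j(w)
            v -= 1                        # v := pred(v)
        else:
            w -= 1                        # w := pred(w)
    return first
```

With a stand-in that encodes a nondecreasing threshold F (v can precede w iff w ≥ F[v]), `chain_pair(5, 5, lambda v, w: 0 if w >= F[v] else 1)` returns F itself for F = [0, 0, 2, 3, 5], after 8 calls to the stand-in.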



†Since the p chains represent a trace of a parallel program, the assumption holds. For arbitrary p chains, one can determine whether they are schedulable using the algorithm in [S].
4.2 Correctness



By induction on v, we show that ChainPair(i, j) correctly computes first_j(v) for every v ∈ C_i.

When v = end(C_i), procedure ChainPair(i, j) keeps replacing w with pred_j(w) until w = ⊥ or MinHeight(v, w) > 0. If w = ⊥, then h(G ∪ {vw'}) = 0 for every w' ∈ C_j. Thus, first_j(v) = succ_j(⊥) = start(C_j) is correct. If MinHeight(v, w) > 0, then v ↛ w. It follows that v ↛ w' for every w' preceding w in C_j. Since MinHeight(v, succ_j(w)) = 0, v → succ_j(w). Therefore, succ_j(w) is the correct value of first_j(v). This confirms the induction basis.

Suppose the procedure ChainPair(i, j) correctly reports succ_j(w) as the value of first_j(succ_i(v)) in a certain iteration of the repeat-loop. We need to show that first_j(v) will also be computed correctly in the remaining iterations. Since succ_i(v) → succ_j(w), we have v → succ_j(w). It follows that v → w' (and thus MinHeight(v, w') = 0) for every w' succeeding w in C_j. In other words, to locate the first node in C_j that could be preceded by v, it suffices to start testing from w. For the same reason as above, ChainPair(i, j) reports the correct value of first_j(v). The correctness is therefore ensured.

4.3 Implementation 

We show in this subsection how to implement ChainPair(i, j) to run in O((|C_i| + |C_j|) log n) time. It then follows that the time complexity of the all-pairs algorithm is O(np log n).

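Suppose that each time before we call ChainPair(i, j), we have the hump tree for I ∪ J ∪ K^−, where

$$I = \mathrm{Decomp}(C_i);$$

$$J = \mathrm{Decomp}([-, \mathrm{pred}(\mathrm{end}(C_j))]);$$

$$K^- = \Bigl\{ H \in \bigcup_{1 \le k \le p,\ k \ne i, j} \mathrm{Decomp}(C_k) : c(H) < 0 \Bigr\}.$$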

It follows from §3.4 that the first call to MinHeight(v, w) can be computed in O(log n) time, since only one Γ(i) needs to be considered. In each of the remaining iterations of the repeat-loop, we either replace v with pred(v) or replace w with pred(w). The following lemma guarantees that, to compute each subsequent MinHeight(v, w), we need only try v as the cutpoint of C_i.

Lemma 4.1 Consider any iteration of the repeat-loop in ChainPair(i, j). When the algorithm computes h = MinHeight(v, w), v is the only cutpoint of C_i that could make h zero.

Proof. By the definition of ChainPair(), when computing MinHeight(v, w), first_j(succ_i(v)) always succeeds w in C_j. Assume for a contradiction that u is a node succeeding v in C_i such that there is a cut Γ of G where Γ(i) = u, Γ(j) = w, and h_j(G[Γ]) = 0. It follows that u → w and thus succ_i(v) → w. This contradicts the fact that first_j(succ_i(v)) succeeds w in C_j. □

Theorem 4.2 Suppose G is as in Theorem 3.4. The compact representation of the relation "v precedes w in some valid subschedule" can be constructed in O(np log n) time and O(n) space.

Proof. Note that in each iteration of the repeat-loop, either v or w is moved by one position. Since the costs of v and w are ±1, by the first hump decomposition property the number of humps updated in I ∪ J ∪ K^− between two consecutive iterations is a constant. Thus, each execution of MinHeight(v, w) takes only O(log n) time. Since the number of iterations of the repeat-loop is O(|C_i| + |C_j|), each execution of ChainPair(i, j) takes time

$$O\bigl((|C_i| + |C_j|) \times \log n\bigr). \qquad (7)$$
It remains to show how to build the hump tree efficiently for each execution of ChainPair(i, j). The very first hump tree can be constructed in time

$$O(n). \qquad (8)$$

Consider the moment when ChainPair(i, j) has just finished and the all-pairs algorithm is about to call ChainPair(i_1, j_1). Since all humps in I ∪ J have been deleted during the execution of ChainPair(i, j), the current T is the hump tree for the N-humps in ⋃_{1 ≤ k ≤ p; k ≠ i, j} Decomp(C_k). To obtain the hump tree for ChainPair(i_1, j_1), we have to add the N-humps in Decomp(C_i) ∪ Decomp(C_j), delete the N-humps in Decomp(C_{j_1}) from T, and then insert the humps in

$$\{H \in \mathrm{Decomp}(C_{i_1}) : c(H) > 0\} \cup \mathrm{Decomp}([-, \mathrm{pred}(\mathrm{end}(C_{j_1}))])$$

into T. The hump decomposition can be done in time

$$O(|C_i| + |C_j| + |C_{i_1}| + |C_{j_1}|). \qquad (9)$$

The insertion and deletion of humps can be done in time 

By (7), (8), (9), and (10), the overall time complexity of the all-pairs algorithm is

$$O(n) + \sum_{1 \le i, j \le p} \Bigl( O(|C_i| + |C_j|) + O(\cdots) \times \log n + O(|C_i| + |C_j|) \times \log n \Bigr),$$

which is O(np log n). □
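As a quick check of the final bound: every chain occurs as C_i in p of the pairs and as C_j in another p of them, and ∑_k |C_k| = n, so

$$\sum_{1 \le i, j \le p} \bigl(|C_i| + |C_j|\bigr) = p \sum_{i=1}^{p} |C_i| + p \sum_{j=1}^{p} |C_j| = 2pn.$$

Summing the per-call bound (7) over all p² calls of ChainPair therefore gives O(np log n).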

5 NP-completeness 

In this section we sketch the proof for the following theorem. 

Theorem 5.1 The race-condition detection problem for a parallel program that uses more than one semaphore is NP-complete.

The proof is by reduction from the NP-complete uniform-cost SMMCC problem, in which the node costs are restricted to ±1 [6]. The reduction has three steps. Given an SMMCC problem for a uniform-cost graph G_0 of n nodes, we construct O(log n) chain graphs with n + 2 semaphores. The first step of the reduction shows that the SMMCC problem for G_0 can be reduced to determining whether each of those O(log n) chain graphs has a valid schedule. The second step shows that each of those O(log n) chain graphs can be simulated by a chain graph with only two semaphores; in other words, the simulated chain graph has a valid schedule if and only if the simulating chain graph has a valid schedule. The last step shows that the simulating chain graph has a valid schedule if and only if v → w for some v and w in the same chain graph. We give the details of the reduction in the appendix.
A Appendix

Let G be a chain graph. Each node of G is an operation on a semaphore. An operation on semaphore S is either +S, incrementing the value of S by one, or −S, decrementing the value of S by one. A subschedule of G is valid if the value of each semaphore is always nonpositive during the execution of the subschedule. Let v and w be two nodes of G. If there exists a valid subschedule of G in which v precedes w, then we say v → w. Clearly, determining whether v → w is in NP. If G is allowed to use more than one semaphore, then we prove NP-hardness by a three-step reduction from the uniform-cost SMMCC problem.
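For intuition on small instances, the relation v → w defined above can be decided by exhaustive search over the states of the chain graph, where a state records how far each chain has advanced. The sketch below is an illustration, not part of the reduction: it assumes every semaphore starts at 0, encodes +S and −S as (S, +1) and (S, −1), and uses a hypothetical function name. Its running time is exponential in the number of chains, which is consistent with the NP-hardness shown in this appendix.

```python
from collections import deque


def exists_precedence(chains, v, w):
    """Brute-force test of v -> w for a chain graph.

    chains[c] is a list of (semaphore, delta) with delta = +1 for +S and
    -1 for -S.  Every semaphore is assumed to start at 0, and a step is
    allowed only if the semaphore stays nonpositive afterwards.  v and w are
    (chain index, position) pairs.  Returns True iff some valid subschedule
    executes v before w.
    """
    (a, i), (b, k) = v, w
    start = (0,) * len(chains)
    seen = {start}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        # Semaphore values are determined by how far each chain has advanced.
        value = {}
        for c, p in enumerate(pos):
            for sem, delta in chains[c][:p]:
                value[sem] = value.get(sem, 0) + delta
        for c, p in enumerate(pos):
            if p == len(chains[c]):
                continue
            sem, delta = chains[c][p]
            if value.get(sem, 0) + delta > 0:
                continue  # this step would make the semaphore positive
            if (c, p) == (b, k) and pos[a] > i:
                return True  # w runs now and v has already run
            nxt = pos[:c] + (p + 1,) + pos[c + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For two one-operation chains −S and +S, the +S can only run after the −S, so the search reports that the −S can precede the +S but not the other way round.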

A.1 First Step

Let G_0 be an acyclic directed graph of n nodes, v_1, v_2, …, v_n. The cost of each node is either +1 or −1. Suppose we would like to know whether h(G_0) ≤ ℓ. We construct a chain graph G_1 composed of 2n + 2 chains of operations on n + 2 semaphores, and argue that G_1 has a valid schedule if and only if h(G_0) ≤ ℓ. Note that 0 ≤ h(G_0) ≤ n. Therefore, h(G_0) can be obtained by O(log n) queries of whether a chain graph of n + 2 semaphores has a valid schedule.
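Since the chain-graph question is monotone in ℓ (G_1 has a valid schedule if and only if h(G_0) ≤ ℓ), the O(log n) queries can be organized as a binary search over 0..n. In this minimal sketch the oracle is an assumption standing in for "build G_1 with Construct and test it for a valid schedule"; for the demonstration it is replaced by a brute-force SMMCC height on a tiny graph, taken as the minimum over precedence-respecting orders of the maximum cumulative cost (empty prefix counted as 0). Function names are hypothetical.

```python
import itertools


def smmcc_height(costs, arcs):
    """Brute-force h(G_0): min over schedules of the max cumulative cost.

    costs[v] is +1 or -1; arcs is a collection of (u, v) meaning u before v.
    Exponential; only for tiny illustrative graphs.
    """
    best = None
    for order in itertools.permutations(range(len(costs))):
        pos = {v: k for k, v in enumerate(order)}
        if all(pos[u] < pos[v] for u, v in arcs):
            running = peak = 0
            for v in order:
                running += costs[v]
                peak = max(peak, running)
            best = peak if best is None else min(best, peak)
    return best


def min_feasible_level(n, is_feasible):
    """Smallest l in 0..n with is_feasible(l), using O(log n) oracle calls.

    Assumes is_feasible is monotone in l and is_feasible(n) holds, as for
    "G_1 built for l has a valid schedule" when 0 <= h(G_0) <= n.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For costs [+1, +1, −1, −1] with arcs v_1v_3 and v_2v_4 (0-based pairs (0, 2) and (1, 3)), every schedule starts with a +1 node, so the height is 1, and the binary search recovers it with three oracle calls.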

Let n^+ be the number of nodes with positive costs and n^− the number of nodes with negative costs. Clearly, n^+ − n^− is the sum of the node costs of G_0. Let d_i be the number of outgoing arcs of G_0 from v_i. The n + 2 semaphores for G_1 are S_1, S_2, …, S_n, S_α, S_β. Let the 2n + 2 chains of G_1 be C_1, …, C_{n+1} and C'_1, …, C'_{n+1}, all initially empty. We construct G_1 from G_0 by the procedure Construct() in Figure 9, which runs in polynomial time. Without loss of generality, we can assume that ℓ − n^+ + n^−, the number in the second-to-last statement of the procedure Construct, is nonnegative, since otherwise h(G_0) > ℓ is immediately concluded.

An example is shown in Figure 10. The intuition is as follows. The (only) operation on S_α in C_i corresponds to v_i, where the "sign" of that operation reflects the cost of v_i. We use the first n semaphores, S_1, …, S_n, to force the execution of these n operations on S_α to obey the precedence constraints imposed by G_0. In Figure 10, for instance, in order to reach the −S_α in C_4, we have to unlock the +S_2 (and the +S_3 and +S_5) in the same chain first. Since the only −S_2 comes after the +S_α in C_2, the +S_α in C_2 must be executed before the −S_α in C_4.
Construct(G_0)

    For i := 1 to n do
        For j := 1 to n do
            If v_j v_i is an arc of G_0 then
                Append a +S_j to C_i.
        If the cost of v_i is +1 then
            Append a +S_α to C_i.
        else (i.e., the cost of v_i is −1)
            Append a −S_α to C_i.
            Append a +S_α and a −S_α to C'_i.
        Append d_i copies of −S_i to C_i.
        Append a −S_β to C_i.
    Append n copies of +S_β to C_{n+1}.
    Append ℓ − n^+ + n^− copies of +S_α to C_{n+1}.
    Append ℓ copies of −S_α to C'_{n+1}.

Figure 9: The procedure constructs a chain graph G_1 such that G_1 has a valid schedule if and only if h(G_0) ≤ ℓ.
Figure 10: An example for the first step of the reduction. Suppose we would like to determine whether h(G_0) ≤ 2, where G_0 is the graph on top. We then construct, by Construct, the chain graph G_1 at bottom. Note that there is one +S_α at the end of C_{n+1} and there are two −S_α's in C'_{n+1}, according to the last two statements of Construct. It follows from Lemmas A.1(1) and A.2 that there exists a valid schedule of the chains at bottom if and only if the height of the graph on top is at most two.
The −S_β's at the end of C_1, …, C_n ensure that, by the time the last +S_β in C_{n+1} is executed, all operations in C_1, …, C_n have already been executed. The function of the ℓ copies of −S_α in C'_{n+1} is clear: the larger ℓ, the easier it is for G_1 to have a valid schedule. The purpose of the +S_α, −S_α pairs in C'_1, …, C'_n and of the ℓ − n^+ + n^− copies of +S_α at the end of C_{n+1} will become clear as we proceed. Basically, they ensure that G_1 has a kind of "pairwise" schedule whenever G_1 has a valid schedule. One can verify that G_1 contains the same number of increments and decrements for each of its n + 2 semaphores.
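The procedure Construct and the balance property just stated can be checked mechanically on tiny graphs. The sketch below is a hypothetical transcription of Figure 9 into Python together with an exhaustive validity search. It assumes 0-based node indices, names S_α and S_β "a" and "b", starts every semaphore at 0, and allows a step only if the semaphore stays nonpositive. The search is exponential and meant only for examples with a handful of nodes.

```python
from functools import lru_cache


def construct(costs, arcs, ell):
    """Chains of G_1 per Construct(G_0); nodes are 0-based here.

    costs[i] is +1 or -1, arcs is a set of (j, i) pairs for arcs v_j -> v_i,
    and ell is the height bound.  Semaphores are named 0..n-1 (S_1..S_n),
    "a" (S_alpha) and "b" (S_beta); an operation is (semaphore, +1 or -1).
    Returns C_1..C_{n+1} followed by C'_1..C'_{n+1}.
    """
    n = len(costs)
    n_plus = sum(1 for x in costs if x > 0)
    n_minus = n - n_plus
    out_deg = [sum(1 for (j, _) in arcs if j == i) for i in range(n)]
    C = [[] for _ in range(n + 1)]
    Cp = [[] for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if (j, i) in arcs:
                C[i].append((j, +1))       # +S_j waits for a -S_j in C_j
        if costs[i] == +1:
            C[i].append(("a", +1))
        else:
            C[i].append(("a", -1))
            Cp[i] += [("a", +1), ("a", -1)]
        C[i] += [(i, -1)] * out_deg[i]     # d_i copies of -S_i
        C[i].append(("b", -1))
    C[n] += [("b", +1)] * n
    C[n] += [("a", +1)] * (ell - n_plus + n_minus)
    Cp[n] += [("a", -1)] * ell
    return C + Cp


def has_valid_schedule(chains):
    """Exhaustive search: can every operation run with all values <= 0?"""
    chains = [tuple(c) for c in chains]

    @lru_cache(maxsize=None)
    def search(pos):
        if all(p == len(c) for p, c in zip(pos, chains)):
            return True
        value = {}
        for c, p in zip(chains, pos):
            for sem, d in c[:p]:
                value[sem] = value.get(sem, 0) + d
        for k, (c, p) in enumerate(zip(chains, pos)):
            if p < len(c):
                sem, d = c[p]
                if value.get(sem, 0) + d <= 0 and search(
                        pos[:k] + (p + 1,) + pos[k + 1:]):
                    return True
        return False

    return search((0,) * len(chains))
```

For instance, `construct([1, -1], {(0, 1)}, ell)` encodes v_1 with cost +1, v_2 with cost −1 and the arc v_1v_2, so h(G_0) = 1; the search finds a valid schedule for ell = 1 but not for ell = 0, as Lemmas A.1 and A.2 predict.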

For the rest of the subsection, we prove that h(G_0) ≤ ℓ if and only if G_1 has a valid schedule. An implication of the following proofs is that G_1 has a valid schedule if and only if it has a valid schedule executable by a procedure Pairwise, which will be given in the proofs.

Lemma A.1 1. If G_1 has a valid subschedule containing the last +S_α of C_{n+1}, then h(G_0) ≤ ℓ.

2. If G_1 has a valid schedule, then h(G_0) ≤ ℓ.

Proof. Clearly, it suffices to prove the first statement, since the second statement follows immediately from the first.

Let X be a valid subschedule of G_1 as described in the lemma. We show that h(G_0) ≤ ℓ. Let O_i be the operation on S_α in C_i. Since X is valid and contains the last +S_α of C_{n+1}, X must contain all the operations in C_1, …, C_n. Therefore, every O_i, 1 ≤ i ≤ n, is in X.

Suppose the order of those O_i's in X is O_{k_1}, O_{k_2}, …, O_{k_n}. By the definition of Construct, if v_j is reachable from v_i in G_0, then O_j does not precede O_i in X. It follows that the sequence Y = v_{k_1} v_{k_2} ⋯ v_{k_n} is a schedule of G_0. Therefore, it suffices to show h(Y) ≤ ℓ.

Assume h(Y) > ℓ for a contradiction. If we count only those O_i's as the operations on S_α in X, then the maximum value of S_α would be greater than ℓ during the execution of X. Note that there are ℓ + n^− other −S_α's in C'_1, …, C'_{n+1}, which are the only hope for bringing the maximum value of S_α down to zero. By the construction of C'_1, …, C'_n, however, n^− of those −S_α's have to be preceded in X by n^− other +S_α's. It follows that even if we count all operations on S_α together, the maximum value of S_α would be greater than zero during the execution of X. This contradicts the fact that X is a valid subschedule of G_1. □

Lemma A.2 If h(G_0) ≤ ℓ, then G_1 has a valid schedule.

Proof. Let Y = v_{k_1} v_{k_2} ⋯ v_{k_n} be a schedule of G_0 with h(Y) ≤ ℓ. Let m_i be the sum of the costs of v_{k_1}, …, v_{k_i}. Clearly, m_n = n^+ − n^−, which is the sum of the node costs of G_0. Since h(Y) ≤ ℓ, we know that m_i ≤ ℓ for every 1 ≤ i ≤ n. We claim that G_1 can be executed by the procedure Pairwise in Figure 11.



Note that in the schedule of G_1 executed by Pairwise, each operation −S_i is immediately followed by an operation +S_i. Not every chain graph has such a "pairwise" schedule; however, we show that G_1 does. We first show that the first for-loop of Pairwise can be finished for G_1. Specifically, suppose the following claim holds:

Claim For each 1 ≤ i ≤ n, the i-th iteration of the first for-loop of Pairwise can be executed for G_1. Furthermore, after executing the i-th iteration,

• the remaining operations in C_{k_i} are d_{k_i} copies of −S_{k_i} followed by a −S_β; and

• there are ℓ − m_i copies of −S_α available in C'_1, …, C'_{n+1}.






Procedure Pairwise

    For k := k_1, k_2, …, k_n do
    …
    Execute a −S_α in C'_1, …, C'_{n+1}.
    Execute a +S_α in C_{n+1}.

Figure 11: Procedure Pairwise.



It is then not hard to see that after the execution of the first for-loop of Pairwise, the remaining operation in each C_i is a −S_β. Therefore, the second for-loop of Pairwise can be finished, since there are n copies of +S_β available in C_{n+1}.

By the claim, after executing the first for-loop, the number of available −S_α's in C'_1, …, C'_{n+1} is ℓ − m_n, which is equal to the number of +S_α's at the end of C_{n+1}. Therefore, the last for-loop of Pairwise can be finished. The lemma is proved.

It remains to prove the above claim by induction on i. For convenience, we abbreviate k_i to k for the rest of the proof. When i = 1, we know that v_k does not have any incoming arcs from other nodes. Therefore, the for-loop with index j in the first iteration does not execute any operation. We then consider the if-statement.

• If O_k = −S_α, then c(v_k) = −1, and thus m_1 = −1. There is a +S_α in C'_k by the definition of Construct. We can execute the else-part of the if-statement without problem. Since the second operation in C'_k is a −S_α, these two steps increase the number of −S_α's available in C'_1, …, C'_{n+1} by one.

• If O_k = +S_α, then c(v_k) = 1, and thus m_1 = 1. Since v_k is the first node in Y, h(Y) is at least one, and thus ℓ ≥ 1. We can therefore execute the then-part of the if-statement without problem. The number of −S_α's available in C'_1, …, C'_{n+1} is decreased by one.

Clearly, after executing the first iteration, in which the only executed operation in C_k is O_k, the remaining operations in C_k are exactly as described in the claim. Note that before executing the first iteration, the number of available −S_α's is ℓ by the definition of Construct. Therefore, after executing the first iteration, the number of available −S_α's is exactly ℓ − m_1. This confirms the inductive basis.
Let i' be an integer with 1 < i' ≤ n. Assume that the claim holds for every 1 ≤ i < i'. We show that it holds for i = i'. Consider the i-th iteration. Note that for every j such that v_j v_k is an arc of G_0, O_j must have been executed. By the inductive hypothesis, the d_j copies of −S_j are already available before the i-th iteration is executed. Therefore, the for-loop with index j will proceed without problem, since there are exactly d_j copies of +S_j in G_1 by the definition of Construct. We then consider the if-statement.

• If O_k = −S_α, then m_i = m_{i−1} − 1. We know there is a +S_α in C'_k. Thus, the else-part can proceed without problem. Since the second operation in C'_k is a −S_α, these two steps increase the number of available −S_α's in C'_1, …, C'_{n+1} by one.

• If O_k = +S_α, then m_i = m_{i−1} + 1. The inductive hypothesis says that the number of −S_α's available in C'_1, …, C'_{n+1} is ℓ − m_{i−1} before the i-th iteration is executed. That number is at least one, since ℓ − m_{i−1} − 1 = ℓ − m_i ≥ 0. Therefore, the then-part of the if-statement can be executed without problem. The number of available −S_α's in C'_1, …, C'_{n+1} is decreased by one.

Therefore, the i-th iteration can be executed, and the remaining operations in C_k are as required.

It follows from the inductive hypothesis that the number of available −S_α's in C'_1, …, C'_{n+1} is ℓ − m_{i−1} before the i-th iteration. By the above case analysis, this number is exactly ℓ − m_i after the i-th iteration is executed. The claim is proved. □

If G_1 has a valid schedule, then by Lemma A.1(2) we know h(G_0) ≤ ℓ. It then follows from the proof of Lemma A.2 that G_1 has a valid schedule executable by Pairwise. Therefore, we have the following lemma.

Lemma A.3 G_1 has a valid schedule if and only if G_1 has a valid schedule executable by Pairwise.

A.2 Second Step

In this subsection we show that the G_1 constructed in the first step can be simulated by another chain graph G_2, which uses only two semaphores, T_1 and T_2. G_2 has 2n + 3 chains. The first chain, denoted C_0, is composed of two −T_1's and two −T_2's. The remaining 2n + 2 chains are obtained from those of G_1 as follows: we replace every operation −S_i (and +S_i) by a unit −U_i (and +U_i), for each 1 ≤ i ≤ n + 2. Each unit, −U_i or +U_i, is a sequence of operations on T_1 and T_2, as shown in Figure 12. We also denote those 2n + 2 chains of G_2 by C_1, …, C_{n+1} and C'_1, …, C'_{n+1}. Clearly, G_2 can be constructed in polynomial time.

Note that the sequence of operations in each unit is arranged such that only a −U_i and a +U_i can "unlock" each other. To be more specific, suppose each of T_1 and T_2 has initial value −2, which will be the case if the four operations in C_0 are executed. Consider a graph U_{ij}, for some 1 ≤ i, j ≤ n + 2, composed of two units, −U_i and +U_j, each forming a single chain. One can easily verify that U_{ij} has a valid schedule if i = j. Moreover, after executing all the operations of U_{ii}, the values of T_1 and T_2 go back to −2.

We claim that G_1 has a valid schedule if and only if G_2 has a valid schedule. The only-if part is straightforward. Suppose G_1 has a valid schedule. By Lemma A.3, G_1 has a valid schedule executable by Pairwise. Note that we can execute the four operations of C_0 first, which decrease the values of both semaphores to −2. Clearly, the remaining 2n + 2 chains of units can then be executed completely, one pair of units at a time, by following the sequence of corresponding operations of G_1 executed by Pairwise. Therefore, G_2 has a valid schedule.
Figure 12: The sequence of operations for a −U_i is at left and that for a +U_i is at right, for any 1 ≤ i ≤ n + 2.



It takes some additional work to prove the other direction of the above claim. A unit is active if its third operation is executed. A unit is finished (and thus inactive) if its fifth-to-last operation is executed. Suppose G_2 has a valid schedule. Consider the sequence of the units of G_2 that become active in the valid schedule. It follows from the following lemma that the corresponding sequence of operations of G_1 is a valid schedule of G_1. In fact, it is "pairwise", since in the schedule each −S_i is immediately followed by a +S_i.

Lemma A.4 Consider the execution of a valid subschedule.

1. When there is no active unit, the next unit that becomes active must be a −U_i for some 1 ≤ i ≤ n + 2.

2. Before that active −U_i is finished, a +U_i must become active.

3. No unit will become active unless these two active units are finished.

Proof. At the beginning of the valid subschedule, no unit is active. We first show that the first statement of the lemma holds. At this moment there are two −T_1's and two −T_2's available (in C_0). They are our only hope for activating any unit, since each unit is guarded by two +T_1's and two +T_2's. Assume for a contradiction that the first unit to become active is a +U_i for some 1 ≤ i ≤ n + 2. Note that as soon as the first +U_i becomes active, at least two +T_1's have already been executed. Since at most two −T_1's have been executed so far, there is no way to activate any other unit. The execution thus cannot proceed.

When the first unit −U_i becomes active, one can see that the second statement of the lemma holds by verifying the following.

• The active −U_i will not be finished unless another unit becomes active, since otherwise the execution will be blocked by some +T_2's.

• The next active unit must be a +U_j for some 1 ≤ j ≤ n + 2, since otherwise the execution will be blocked by some +T_2's.

• If i < j, the execution will be blocked by some +T_1's. If i > j, the execution will be blocked by some +T_2's. Therefore, the next active unit must be a +U_i.

When those two units are active, the only hope for activating other units is the −T_1's at the end of the active +U_i. In order to reach those −T_1's, the consecutive +T_2's preceding them must be executed. Hence, at least two −T_2's at the end of the active −U_i must be executed first. Therefore, the two active units −U_i and +U_i must be finished before any other unit becomes active. This confirms the third statement of the lemma.

Note that as soon as the active +U_i is finished (and so must be the active −U_i), the situation is exactly the same as at the very beginning of the execution: we again have two −T_1's and two −T_2's available, which are our only hope for activating any other unit. Therefore, the above argument applies inductively. The lemma is proved. □



A.3 Third Step 



Let v be the first operation of C_0 in G_2, and let w be the last operation of C_{n+1} in G_2. We claim that v → w if and only if G_2 has a valid schedule; note that v is always the first node in any valid subschedule of G_2. The if-part of the claim holds trivially. It remains to prove the only-if part of the claim.

Let X be a valid subschedule of G_2 in which v precedes w. Consider the sequence of the units of G_2 that become active while executing X. It follows from Lemma A.4 that the corresponding sequence of operations of G_1 is a valid subschedule of G_1, which necessarily contains the last +S_α of C_{n+1} in G_1. Therefore, G_1 has a valid schedule by Lemmas A.1(1) and A.2. Finally, it follows from the claim in §A.2 that G_2 has a valid schedule.